JUL-14-2003 05:03PM FROM-FENWICK&WEST MOUNTAIN VIEW 6509385200 T-575 P. 003/000 F-668



Pending Claim For:



APPLICANTS:      Ted E. Dunning and Bradley Kindig
SERIAL NO:       09/848,982
FILING DATE:     May 3, 2001
TITLE:           Text Equivalencing Engine
ATTY. DKT. NO.:  22227-05479



1. A computer-implemented method of text equivalencing from a string of characters comprising:

    modifying the string of characters using a predetermined set of heuristics;

    comparing the modified string with a known string of characters in order to locate a match;

    responsive to not finding a match, forming a plurality of sub-strings of characters from the string of characters; and

    using an information retrieval technique on the sub-strings of characters to determine a known string of characters equivalent to the string of characters.
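As an informal illustration of the flow recited in claim 1 (not the applicants' implementation), the following Python sketch applies heuristics to the input, tries an exact lookup, and only on a miss falls back to a search over sub-strings. The function names, the lowercase and whitespace heuristics, and the plain count of shared 3-grams used as the retrieval step are all assumptions chosen for brevity.

```python
def normalize(text):
    """Hypothetical heuristics: lowercase and drop all whitespace."""
    return "".join(text.lower().split())


def trigrams(text):
    """Return the set of 3-character sub-strings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def find_equivalent(raw, known_strings):
    """Return a known string judged equivalent to raw, or None.

    known_strings maps a normalized form to its canonical known string.
    """
    key = normalize(raw)
    # Step 1: compare the modified string against the known strings.
    if key in known_strings:
        return known_strings[key]
    # Step 2: no match, so form sub-strings and rank known strings by
    # how many 3-grams they share with the input (a stand-in for a
    # real information retrieval technique).
    query = trigrams(key)
    best, best_overlap = None, 0
    for known_key, canonical in known_strings.items():
        overlap = len(query & trigrams(known_key))
        if overlap > best_overlap:
            best, best_overlap = canonical, overlap
    return best


known = {normalize(s): s for s in ["Abbey Road", "Let It Be"]}
print(find_equivalent("abbey  road", known))  # exact match after heuristics
print(find_equivalent("Abey Rode", known))    # resolved by the 3-gram fallback
```

In this sketch the fallback runs only when the exact comparison fails, which mirrors the "responsive to not finding a match" condition in the claim.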



2. The method of claim 1, wherein the information retrieval technique further comprises:

    weighting the sub-strings;

    scoring the known string of characters; and

    retrieving information associated with the known string of characters with the highest score.
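Claim 2 names weighting, scoring, and retrieval of the highest-scoring known string without prescribing a formula. The sketch below is one possible reading, assuming inverse-document-frequency weights over 3-grams so that rare sub-strings count more than common ones; `build_weights` and `best_match` are hypothetical names, and the weighting formula is an assumption rather than anything stated in the claims.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the set of 3-character sub-strings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_weights(known_strings):
    """Weight each 3-gram by inverse document frequency (an assumption)."""
    doc_freq = Counter()
    for s in known_strings:
        doc_freq.update(trigrams(s))
    n = len(known_strings)
    return {g: math.log(1 + n / df) for g, df in doc_freq.items()}


def best_match(query, known_strings, weights):
    """Score every known string and return (best_string, best_score)."""
    q = trigrams(query)
    scored = []
    for s in known_strings:
        # Only 3-grams shared with a known string contribute to its score.
        shared = q & trigrams(s)
        scored.append((sum(weights[g] for g in shared), s))
    score, winner = max(scored)
    return winner, score


known = ["yellowsubmarine", "yellowbrickroad", "octopusgarden"]
weights = build_weights(known)
winner, score = best_match("yelowsubmarine", known, weights)
```

Here the misspelled query shares the common 3-gram `yel` with two known strings, but the rarer 3-grams such as `sub` and `mar` carry more weight and settle the ranking.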



3. The method of claim 2, further comprising, responsive to the highest score being greater than a first threshold, automatically accepting the known string of characters as an exact match.

4. The method of claim 2, further comprising, responsive to the highest score being less than a second threshold and greater than a first threshold, presenting the known string of characters to a user for manual confirmation.

5. The method of claim 2, further comprising, responsive to the highest score being less than a second threshold and greater than a third threshold, presenting the known string of characters to a user to select the equivalent string of characters.
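Claims 3 to 5 tie the follow-up action to where the highest score falls relative to a set of thresholds. The sketch below assumes three thresholds ordered from strictest to loosest and names them by purpose, because the claims number the thresholds differently from claim to claim and give no values. The numeric defaults and the final `REJECT` outcome are invented for illustration.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept automatically"
    CONFIRM = "ask user to confirm"
    CHOOSE = "ask user to choose from candidates"
    REJECT = "no equivalent found"


def decide(score, accept_at=0.9, confirm_at=0.7, choose_at=0.4):
    """Map the highest retrieval score to an action.

    The threshold values are illustrative placeholders, and the
    descending order accept_at > confirm_at > choose_at is an assumption.
    """
    if score > accept_at:
        return Action.ACCEPT
    if score > confirm_at:
        return Action.CONFIRM
    if score > choose_at:
        return Action.CHOOSE
    return Action.REJECT
```

For example, `decide(0.8)` falls between the two upper thresholds and yields `Action.CONFIRM`, the manual-confirmation path.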

6. The method of claim 1, wherein the sub-strings of characters are 3-grams.

7. The method of claim 1, wherein the string of characters is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.

8. The method of claim 1, wherein the predetermined set of heuristics comprises removing whitespace from the string of characters.

9. The method of claim 1, wherein the predetermined set of heuristics comprises removing a portion of the string of characters.

10. The method of claim 1, wherein the predetermined set of heuristics comprises replacing a symbol in the string of characters with an alternate representation for the symbol.
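Claims 8 to 10 give three examples of heuristics: removing whitespace, removing a portion of the string, and replacing a symbol with an alternate representation. The following sketch shows one way these could be chained. The specific rules (dropping parenthesized text, spelling out `&` as `and`), the added lowercasing step, and the order of application are assumptions, since the claims do not specify them.

```python
import re

# Hypothetical symbol table; the claims do not list specific symbols.
SYMBOL_ALTERNATES = {"&": "and", "+": "and", "@": "at"}


def replace_symbols(text):
    """Replace each listed symbol with a spelled-out alternate."""
    for symbol, alternate in SYMBOL_ALTERNATES.items():
        text = text.replace(symbol, f" {alternate} ")
    return text


def remove_portion(text):
    """Drop parenthesized portions such as '(Live)' (illustrative rule)."""
    return re.sub(r"\([^)]*\)", "", text)


def remove_whitespace(text):
    """Remove all whitespace characters."""
    return "".join(text.split())


def apply_heuristics(text):
    """Apply the three heuristics in a fixed, assumed order."""
    text = text.lower()
    text = replace_symbols(text)
    text = remove_portion(text)
    return remove_whitespace(text)
```

With these rules, `"Simon & Garfunkel (Live)"` and `"Simon and Garfunkel"` both reduce to `"simonandgarfunkel"`, so the exact comparison step can match them without any retrieval.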

11. The method of claim 1 further comprising storing an indication that the string of characters is the equivalent of the known string of characters.
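Claim 11 adds a step that records an equivalence once it has been determined. A minimal sketch, assuming a simple in-memory mapping as the store (the class and method names are hypothetical, and a real system would presumably use a database):

```python
class EquivalenceStore:
    """Toy in-memory store mapping variant strings to known strings."""

    def __init__(self, known_strings):
        self._known = set(known_strings)
        # Every known string is trivially equivalent to itself.
        self._canonical = {s: s for s in known_strings}

    def lookup(self, text):
        """Return the known string recorded for text, or None."""
        return self._canonical.get(text)

    def record(self, variant, known):
        """Store an indication that variant is equivalent to known."""
        if known not in self._known:
            raise KeyError(f"{known!r} is not a known string")
        self._canonical[variant] = known


store = EquivalenceStore(["The Beatles"])
store.record("Beatles, The", "The Beatles")
```

Once recorded, a later lookup of the same variant resolves directly, without repeating the sub-string retrieval.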



12. A computer-implemented system for text equivalencing from a string of characters comprising:

    a heuristics module for modifying the string of characters using a predetermined set of heuristics;

    a comparator module, coupled to the heuristics module, for comparing the modified string with a known string of characters in order to find a match;

    a sub-string formation module, coupled to the comparator module, responsive to not finding a match, for forming a plurality of sub-strings of characters from the string of characters; and

    an information retrieval module, coupled to the sub-string formation module, for performing an information retrieval technique on the sub-strings of characters to determine a known string of characters equivalent to the string of characters.

13. The system of claim 12, wherein the information retrieval module further comprises:

    a weight module for weighting the sub-strings;

    a score module for scoring the known string of characters; and

    a retrieval module, coupled to the weight and score modules, for retrieving information associated with the known string of characters with the highest score.



14. The system of claim 13, further comprising an accept module, coupled to the retrieval module, for accepting the information retrieved as an exact match for the highest score greater than a first threshold.



15. The system of claim 13, further comprising an accept module, coupled to the retrieval module, for presenting the information retrieved to a user for manual confirmation for the highest score less than a first threshold and greater than a second threshold.

16. The system of claim 13, further comprising an accept module, coupled to the retrieval module, for presenting the information retrieved to the user as a set of options for a user to select for the highest score less than a second threshold and greater than a third threshold.

17. The system of claim 12, wherein the sub-strings of characters are 3-grams.

18. The system of claim 12, wherein the string of characters is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.

19. The system of claim 12, wherein the predetermined set of heuristics comprises removing whitespace from the string of characters.

20. The system of claim 12, wherein the heuristics module comprises a removal module for removing a portion of the string of characters.

21. The system of claim 12, wherein the heuristics module comprises a replacement module for replacing a symbol in the string of characters with an alternate representation for the symbol.

22. The system of claim 12 further comprising a database update module for storing an indication that the known string of characters is the equivalent of the known string of characters.

23. A computer-readable medium comprising computer-readable code for performing text equivalencing from a string of characters comprising:

    computer-readable code adapted to modify the string of characters using a predetermined set of heuristics;

    computer-readable code adapted to compare the modified string with a known string of characters in order to locate a match;

    computer-readable code, responsive to not finding a match, adapted to form a plurality of sub-strings of characters from the string of characters; and

    computer-readable code adapted to use an information retrieval technique on the sub-strings of characters to determine a known string of characters equivalent to the string of characters.

24. The computer-readable medium of claim 23, wherein the information retrieval technique further comprises:

    computer-readable code adapted to weight the sub-strings;

    computer-readable code adapted to score the known string of characters; and

    computer-readable code adapted to retrieve information associated with the known string of characters with the highest score.

25. The computer-readable medium of claim 24, further comprising computer-readable code, responsive to the highest score being greater than a first threshold, adapted to automatically accept the known string of characters as an exact match.



26. The computer-readable medium of claim 24, further comprising computer-readable code, responsive to the highest score being less than a second threshold and greater than a first threshold, adapted to present the known string of characters to a user for manual confirmation.



27. The computer-readable medium of claim 24, further comprising computer-readable code, responsive to the highest score being less than a second threshold and greater than a third threshold, adapted to present the known string of characters to a user to select the equivalent string of characters.

28. The computer-readable medium of claim 23, wherein the sub-strings of characters are 3-grams.

29. The computer-readable medium of claim 23, wherein the string of characters is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.

30. The computer-readable medium of claim 23, wherein the predetermined set of heuristics comprises removing whitespace from the string of characters.

31. The computer-readable medium of claim 23, wherein the predetermined set of heuristics comprises removing a portion of the string of characters.



32. The method of claim 23, wherein the predetermined set of heuristics comprises replacing a symbol in the string of characters with an alternate representation for the symbol.

33. The computer-readable medium of claim 23 further comprising updating the known string of characters to indicate the string of characters is the equivalent of the known string of characters.

34. A computer-implemented system for performing text equivalencing from a string of characters comprising:

    a modifying means for modifying the string of characters using a predetermined set of heuristics;

    a comparator means for comparing the modified string with a known string of characters in order to locate a match;

    responsive to not finding a match, a formation means for forming a plurality of sub-strings of characters from the string of characters; and

    an information retrieval means for determining a known string of characters equivalent to the string of characters.

35. The system of claim 34, wherein the information retrieval means further comprises:

    a weight means for weighting the sub-strings;

    a score means for scoring the known string of characters; and

    a retrieval means for retrieving information associated with the known string of characters with the highest score.